Distributed Machine Learning via Sufficient Factor Broadcasting
Matrix-parametrized models, including multiclass logistic regression and
sparse coding, are used in machine learning (ML) applications ranging from
computer vision to computational biology. When these models are applied to
large-scale ML problems starting at millions of samples and tens of thousands
of classes, their parameter matrix can grow at an unexpected rate, resulting in
high parameter synchronization costs that greatly slow down distributed
learning. To address this issue, we propose a Sufficient Factor Broadcasting
(SFB) computation model for efficient distributed learning of a large family of
matrix-parametrized models, which share the following property: the parameter
update computed on each data sample is a rank-1 matrix, i.e., the outer product
of two "sufficient factors" (SFs). By broadcasting the SFs among worker
machines and reconstructing the update matrices locally at each worker, SFB
improves communication efficiency (communication costs are linear in the
parameter matrix's dimensions, rather than quadratic) without affecting
computational correctness. We present a theoretical convergence analysis of
SFB, and empirically corroborate its efficiency on four different
matrix-parametrized ML models.
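The rank-1 property described in this abstract can be illustrated with a minimal sketch. It assumes multiclass logistic regression with a softmax loss, for which the per-sample gradient with respect to a K x D weight matrix is the outer product of u = p - y (length K) and v = x (length D). The function names (`sufficient_factors`, `apply_update`) and the two-worker setup are hypothetical and are not taken from the paper.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sufficient_factors(W, x, label):
    """Return (u, v) such that the per-sample softmax-loss gradient
    with respect to W equals the outer product of u and v.

    Assumption: W is a K x D list of lists, x has length D, and label
    is a class index in [0, K).
    """
    logits = [sum(w * xd for w, xd in zip(row, x)) for row in W]
    p = softmax(logits)
    u = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return u, list(x)


def apply_update(W, u, v, lr):
    """Rebuild the rank-1 update locally and apply W -= lr * outer(u, v)."""
    for k, uk in enumerate(u):
        for d, vd in enumerate(v):
            W[k][d] -= lr * uk * vd


if __name__ == "__main__":
    K, D = 4, 3
    # Two workers hold identical replicas of the parameter matrix.
    W_a = [[0.0] * D for _ in range(K)]
    W_b = [[0.0] * D for _ in range(K)]
    # Each worker computes sufficient factors on its own sample.
    sf_a = sufficient_factors(W_a, [1.0, 2.0, 0.5], 2)
    sf_b = sufficient_factors(W_b, [0.0, -1.0, 1.0], 0)
    # Only (u, v) is exchanged: K + D numbers instead of K * D.
    for W in (W_a, W_b):
        for u, v in (sf_a, sf_b):
            apply_update(W, u, v, lr=0.1)
    print("replicas agree:", W_a == W_b)
    print("numbers sent per sample:", K + D, "vs full matrix:", K * D)
```

Each worker sends K + D numbers per sample rather than the K x D update matrix, which is the linear-versus-quadratic saving the abstract refers to. The sketch omits the scheduling, consistency model, and convergence conditions that the paper analyzes.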
Architectural Support for Efficient Communication in Future Microprocessors
Traditionally, microprocessor design has focused on the computational aspects
of the problem at hand. However, as the number of components on a single chip
continues to increase, the design of the communication architecture has become a crucial
and dominating factor in defining performance models of the overall system. On-chip
networks, also known as Networks-on-Chip (NoC), emerged recently as a promising
architecture to coordinate chip-wide communication.
Although there are numerous interconnection network studies in an inter-chip
environment, an intra-chip network design poses a number of substantial challenges
to this well-established interconnection network field. This research investigates designs
and applications of on-chip interconnection networks in next-generation microprocessors
for optimizing performance, power consumption, and area cost. First,
we present domain-specific NoC designs targeted to large-scale and wire-delay dominated
L2 cache systems. The domain-specifically designed interconnect shows a 38%
performance improvement and uses only 12% of the mesh-based interconnect. Then,
we present a methodology for characterizing communication in parallel programs
and applying the characterization results to long-channel reconfiguration. Reconfigured
long channels suited to communication patterns improve the latency of the
mesh network by 16% and 14% in 16-core and 64-core systems, respectively. Finally,
we discuss an adaptive data compression technique that builds a network-wide
frequent value pattern map and reduces the packet size. In two examined multi-core
systems, cache traffic has 69% compressibility and shows high value sharing among
flows. Compression-enabled NoC improves the latency by up to 63% and reduces energy
consumption by up to 12%.
Capturing scattered discriminative information using a deep architecture in acoustic scene classification
Frequently misclassified pairs of classes that share many common acoustic
properties exist in acoustic scene classification (ASC). To distinguish such
pairs of classes, trivial details scattered throughout the data could be vital
clues. However, these details are less noticeable and are easily removed using
conventional non-linear activations (e.g. ReLU). Furthermore, making design
choices to emphasize trivial details can easily lead to overfitting if the
system is not sufficiently generalized. In this study, based on the analysis of
the ASC task's characteristics, we investigate various methods to capture
discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear
activations in a deep neural network, thereby applying an element-wise
comparison between different filters of a convolution layer's output. Two data
augmentation methods and two deep architecture modules are further explored to
reduce overfitting and sustain the system's discriminative power. Various
experiments are conducted using the Detection and Classification of Acoustic
Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods. Our
results show that the proposed system consistently outperforms the baseline,
where the single best performing system has an accuracy of 70.4% compared to
65.1% of the baseline.
Comment: Submitted to DCASE2020 workshop
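The element-wise comparison between filters mentioned in this abstract can be sketched as follows. The abstract does not say how filters are paired; the sketch assumes the common convention of splitting the channels into two halves and taking the element-wise maximum of channel i and channel i + C/2. The function name `max_feature_map` and the flat-list representation of each channel are hypothetical.

```python
def max_feature_map(feature_maps):
    """Element-wise maximum over paired channels.

    Assumption: feature_maps is a list of C channels (C even), each a
    flat list of activations of equal length. Channel i is paired with
    channel i + C // 2, so the output has C // 2 channels.
    """
    c = len(feature_maps)
    if c == 0 or c % 2 != 0:
        raise ValueError("expected a non-empty, even number of channels")
    half = c // 2
    return [
        [max(a, b) for a, b in zip(feature_maps[i], feature_maps[i + half])]
        for i in range(half)
    ]


if __name__ == "__main__":
    conv_out = [[1.0, -2.0], [0.0, 5.0], [-3.0, 4.0], [2.0, -1.0]]
    print(max_feature_map(conv_out))
    # A small negative activation survives if it wins the comparison,
    # whereas ReLU would set it to zero.
    print(max_feature_map([[-1.0], [-2.0]]))
```

Unlike ReLU, this operation has no fixed threshold: the value kept depends only on which of the two paired filters responds more strongly, which is one way low-magnitude details can be preserved.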
A micromachined flow shear-stress sensor based on thermal transfer principles
Microhot-film shear-stress sensors have been developed by using surface micromachining techniques. The sensor consists of a suspended silicon-nitride diaphragm located on top of a vacuum-sealed cavity. A heating and heat-sensing element, made of polycrystalline silicon material, resides on top of the diaphragm. The underlying vacuum cavity greatly reduces conductive heat loss to the substrate and therefore increases the sensitivity of the sensor. Testing of the sensor has been conducted in a wind tunnel under three operation modes: constant current, constant voltage, and constant temperature. Under the constant-temperature mode, a typical shear-stress sensor exhibits a time constant of 72 μs.